- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::. Oct/Nov 98
- :::\_____\::::::::::. Issue 1
- ::::::::::::::::::::::.........................................................
-
- A S S E M B L Y P R O G R A M M I N G J O U R N A L
- http://asmjournal.freeservers.com
- asmjournal@mailcity.com
-
-
-
-
- T A B L E O F C O N T E N T S
- ----------------------------------------------------------------------
- Introduction...................................................mammon_
-
- "VGA Programming in Mode 13h".............................Lord Lucifer
-
- "SMC Techniques: The Basics"...................................mammon_
-
- "Going Ring0 in Windows 9x".....................................Halvar
-
- Column: Win32 Assembly Programming
- "The Basics"..............................................Iczelion
- "MessageBox"..............................................Iczelion
-
- Column: The C standard library in Assembly
- "_itoa, _ltoa and _ultoa"...................................Xbios2
-
- Column: The Unix World
- "x86 ASM Programming for Linux"............................mammon_
-
- Column: Issue Solution
- "11-byte Solution"..........................................Xbios2
- ----------------------------------------------------------------------
- +++++++++++++++++++++++Issue Challenge++++++++++++++++++++
- Write a program that displays its command line in 11 bytes
- ----------------------------------------------------------------------
-
-
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::..............................................INTRODUCTION
- by mammon_
-
-
- Welcome to the first issue of Assembly Programming Journal. Assembly language
- has become of renewed interest to a lot of programmers, in what must be a
- backlash to the surge of poor-quality RAD-developed programs (from Delphi, VB,
- etc) released as free/shareware over the past few years. Assembly language
- code is tight, fast, and often well-coded -- you tend to find fewer
- inexperienced coders writing in assembly language than you do writing in, say,
- Visual Basic.
-
- The selection of articles is somewhat eclectic and should demonstrate the
- focus of this magazine: i.e., it targets the assembly-language programming
- community, not any particular type of coding such as Win32, virus, or demo
- programming. As the magazine is newly born and much of its purpose may seem
- unclear, I will devote the rest of this column to the most common questions I
- have received via email regarding the mag.
-
-
- How often will an issue be released?
- ------------------------------------
- Barring hazard, an issue will be released every other month.
-
-
- What types of articles will be accepted?
- ----------------------------------------
- Anything to do with assembly language. Obviously repeats of previously
- presented material are not necessary unless they enhance or clarify the
- earlier material. The focus will be on Intel x86 instruction sets; however,
- coding for other processors is acceptable (though out of courtesy it would be
- good to point to an x86 emulator for the processor you write for).
-
- Personally I am looking for articles on the areas of assembly language that
- interest me: code optimization, demo/graphics programming, virus coding, unix
- and other-OS asm coding, and OS-internals.
-
- Demos (with source) and quality ASCII art (for issue covers, column logos,
- etc) are especially welcome.
-
-
- For what level of coding experience is the mag intended?
- --------------------------------------------------------
- The magazine is intended to appeal to asm coders of all levels. Each issue
- will contain mostly beginner and intermediate level code/techniques, as these
- will by nature be of the greatest demand; however one of the goals of APJ is
- to include enough advanced material to make the magazine appeal to "pros" as
- well.
-
-
- How will the mag be distributed?
- --------------------------------
- Assembly Programming Journal has its own web page at
- http://asmjournal.freeservers.com
- which will contain the current issue and an archive of previous issues. The
- page also contains a guestbook and a discussion board for article writers and
- readers.
-
- An email subscription may be obtained by sending an email to
- asmjournal@mailcity.com
- with the subject "SUBSCRIBE"; starting with the next issue, Assembly
- Programming Journal will be emailed to the address you sent the mail from.
-
-
- Wrap-up
- -------
- That's the bulk of the "faq". Enjoy the mag!
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................FEATURE.ARTICLE
- VGA Programming in Mode 13h
- by Lord Lucifer
-
-
- This article will describe how to program VGA graphics Mode 13h using assembly
- language. Mode 13h is the 320x200x256 graphics mode, and is fast and very
- convenient from a programmer's perspective.
-
- The video buffer begins at address A000:0000 and ends at address A000:F9FF.
- This means the buffer is 64000 bytes long and that each pixel in mode 13h is
- represented by one byte.
-
- It is easy to set up mode 13h and the video buffer in assembly language:
-
- mov ax,0013h ; Int 10 - Video BIOS Services
- int 10h ; ah = 00 - Set Video Mode
- ; al = 13 - Mode 13h (320x200x256)
-
- mov ax,0A000h ; point segment register es to A000h
- mov es,ax ; we can now access the video buffer as
- ; offsets from register es
-
- At the end of your program, you will probably want to restore the text mode.
- Here's how:
-
- mov ax,0003h ; Int 10 - Video BIOS Services
- int 10h ; ah = 00 - Set Video Mode
- ; al = 03 - Mode 03h (80x25x16 text)
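-
- Putting the pieces so far together, here is a minimal sketch of a complete
- program (not part of the original listing): it sets mode 13h, fills the
- screen with one color, waits for a key, and restores text mode. It assumes a
- .COM file built with TASM or MASM in the tiny model; the label name, the
- color index 1, and the keypress wait via Int 16h are illustrative choices.
-
- .model tiny
- .code
- org 100h
- start:
- mov ax,0013h ; set mode 13h
- int 10h
- mov ax,0A000h ; es -> video buffer
- mov es,ax
- xor di,di ; start at offset 0
- mov ax,0101h ; color index 1 in both bytes of ax
- mov cx,32000 ; 64000 bytes = 32000 words
- cld ; fill upward through memory
- rep stosw ; store ax at es:[di], cx times
- xor ah,ah ; Int 16h, ah = 00 - wait for a keypress
- int 16h
- mov ax,0003h ; back to 80x25 text mode
- int 10h
- mov ax,4C00h ; Int 21h, ah = 4C - terminate
- int 21h
- end start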
-
- Accessing a specific pixel in the buffer is also very easy:
-
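- ; bx = x coordinate
- ; ax = y coordinate
- mov dx,320 ; mul takes no immediate operand, so load 320 first
- mul dx ; dx:ax = y * 320, the row offset (dx is clobbered)
- add ax,bx ; add this with the x coord to get offset
- mov di,ax ; ax cannot be used as a 16-bit address register
-
- mov cl,es:[di] ; now pixel x,y can be accessed as es:[di]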
-
- Hmm... That was easy, but that multiplication is slow and we should get rid of
- it. That's easy to do too, simply by using bit shifting instead of
- multiplication. Shifting a number left by one bit is the same as multiplying
- it by 2. We want to multiply by 320, which is not a power of 2, but
- 320 = 256 + 64, and 256 and 64 are both powers of 2. So a faster way to access
- a pixel is:
-
- ; bx = x coordinate
- ; ax = y coordinate
- ; (shifting by an immediate count other than 1 requires a 186 or later)
- mov cx,ax ; copy the y coord to cx
- shl cx,8 ; shift left by 8, which is the same as
- ; multiplying by 2^8 = 256
- shl ax,6 ; now shift left by 6, which is the same as
- ; multiplying by 2^6 = 64
- add ax,cx ; now add those two together, which is
- ; effectively multiplying y by 320
- add ax,bx ; finally add the x coord to this value
- mov di,ax ; move the offset into an address register
- mov cl,es:[di] ; now pixel x,y can be accessed as es:[di]
-
- Well, the code is a little bit longer and looks more complicated, but I can
- guarantee it's much faster.
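-
- As a worked check of the arithmetic, take the pixel at x = 10, y = 5:
- 5*256 + 5*64 + 10 = 1280 + 320 + 10 = 1610, the same as 5*320 + 10. If the
- code must also run on an 8086/8088, where shifts by an immediate count other
- than 1 are not available, one possible variant (a sketch, assuming y is in
- the range 0-199 so that ah starts out as zero) is:
-
- ; bx = x coordinate
- ; ax = y coordinate (0-199, so ah = 0)
- xchg ah,al ; ax = y * 256
- mov di,ax ; di = y * 256
- shr ax,1
- shr ax,1 ; ax = y * 64
- add di,ax ; di = y * 320
- add di,bx ; di = y * 320 + x
- mov cl,es:[di] ; read pixel x,y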
-
- To plot colors, we use a color look-up table. This look-up table is a 768-byte
- (3x256) array; the entry for color index n starts at offset n*3. The 3 bytes
- of each entry hold the values (0-63) of the red, green, and blue components.
- This gives a total of 64^3 = 262144 possible colors.
- However, since the table is only 256 elements big, only 256 different colors
- are possible at a given time.
-
- Changing the color palette is accomplished through the use of the I/O ports of
- the VGA card:
-
- Port 03C7h is the Palette Register Read port.
- Port 03C8h is the Palette Register Write port.
- Port 03C9h is the Palette Data port.
-
- Here is how to change the color palette:
-
- ; al = palette index
- ; bl = red component (0-63)
- ; cl = green component (0-63)
- ; ch = blue component (0-63)
- ; (dx holds the port number, so dl cannot carry a color component)
-
- mov dx,03C8h ; 03c8h = Palette Register Write port
- out dx,al ; choose index
-
- mov dx,03C9h ; 03c9h = Palette Data port
- mov al,bl
- out dx,al ; set red value
- mov al,cl
- out dx,al ; set green value
- mov al,ch
- out dx,al ; set blue value
-
- That's all there is to it. Reading the color palette is similar:
-
- ; in: al = palette index
- ; out: bl = red component (0-63)
- ; cl = green component (0-63)
- ; ch = blue component (0-63)
-
- mov dx,03C7h ; 03c7h = Palette Register Read port
- out dx,al ; choose index
-
- mov dx,03C9h ; 03c9h = Palette Data port
- in al,dx
- mov bl,al ; get red value
- in al,dx
- mov cl,al ; get green value
- in al,dx
- mov ch,al ; get blue value
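-
- The VGA advances to the next palette index automatically after every third
- byte written to port 03C9h, so a whole palette can be loaded after a single
- index write. As an illustration (a sketch, not from the original article;
- the label name is made up), this sets all 256 entries to a gray ramp:
-
- mov dx,03C8h ; Palette Register Write port
- xor al,al
- out dx,al ; start at index 0
- inc dx ; dx = 03C9h, Palette Data port
- xor bl,bl ; bl = current index, 0-255
- GrayLoop:
- mov al,bl
- shr al,1
- shr al,1 ; scale 0-255 down to 0-63
- out dx,al ; red
- out dx,al ; green
- out dx,al ; blue
- inc bl
- jnz GrayLoop ; bl wraps to 0 after index 255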
-
- Now all we need to know is how to plot a pixel of a certain color at a certain
- location. It's very easy, given what we already know:
-
- ; bx = x coordinate
- ; ax = y coordinate
- ; dl = color (0-255)
- mov cx,ax ; copy the y coord to cx
- shl cx,8 ; shift left by 8, which is the same as
- ; multiplying by 2^8 = 256
- shl ax,6 ; now shift left by 6, which is the same as
- ; multiplying by 2^6 = 64
- add ax,cx ; now add those two together, which is
- ; effectively multiplying y by 320
- add ax,bx ; finally add the x coord to this value
- mov di,ax ; move the offset into an address register
- mov es:[di],dl ; copy color dl into memory location
- ; that's all there is to it
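-
- Since the pixels of one row are adjacent in memory, a horizontal run of
- pixels can be filled with a single string instruction. A sketch (not from
- the original; it assumes es = 0A000h, that di already holds the offset
- y*320 + x of the leftmost pixel, and that the run does not cross the end of
- the row):
-
- ; di = offset of the leftmost pixel (y*320 + x)
- ; cx = length of the run in pixels
- ; al = color (0-255)
- cld ; fill upward through memory
- rep stosb ; store al at es:[di], cx times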
-
- Ok, we now know how to set up Mode 13h, set up the video buffer, plot a pixel,
- and edit the color palette.
-
- My next article will go on to show how to draw lines, utilize the vertical
- retrace for smoother rendering, and anything else I can figure out by that
- time...
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................FEATURE.ARTICLE
- SMC Techniques: The Basics
- by mammon_
-
-
- One of the benefits of coding in assembly language is that you have the option
- to be as tricky as you like: the binary gymnastics of viral code demonstrate
- this above all else. One of the viral "tricks" that has made its way into
- standard protection schemes is SMC: self-modifying code.
-
- In this article I will not be discussing polymorphic viruses or mutation
- engines; I will not go into any specific software protection scheme, or cover
- any anti-debugger/anti-disassembler tricks, or even touch on the matter of the
- PIQ. This is intended to be a simple primer on self-modifying code, for those
- new to the concept and/or implementation.
-
-
- Episode 1: Opcode Alteration
- ----------------------------
- One of the purest forms of self-modifying code is to change the value of an
- instruction before it is executed...sometimes as the result of a comparison,
- and sometimes to hide the code from prying eyes. This technique essentially
- has the following pattern:
- mov reg1, code-to-write
- mov [addr-to-write-to], reg1
- where 'reg1' would be any register, and where '[addr-to-write-to]' would be a
- pointer to the address to be changed. Note that 'code-to-write' would ideally
- be an instruction in hexadecimal format, but by placing the code elsewhere in
- the program--in an uncalled subroutine, or in a different segment--it is
- possible to simply transfer the compiled code from one location to another via
- indirect addressing, as follows:
- ;assumes CS = DS, as in a .COM file
- call changer
- mov dx, offset string ;this is executed, but its result is never used
- target: mov ah, 09 ;this is overwritten before it is executed
- int 21h ;this now exits the program
- ....
- changer: mov di, offset to_write ;load address of code-to-write in DI
- mov ax, [di] ;fetch the two bytes of 'mov ah, 4Ch'
- mov word ptr [target], ax ;write code to location 'target:'
- ret ;return from call
- to_write: mov ah, 4Ch ;terminate to DOS function
-
- This small routine will cause the program to exit, though in a disassembler it
- at first appears to be a simple print string routine. Note that by combining
- indirect addressing with loops, entire subroutines--even programs--can be
- overwritten, and the code to be written--which may be stored in the program as
- data--can be encrypted with a simple XOR to disguise it from a disassembler.
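-
- To illustrate the loop idea, here is a minimal sketch (the names
- 'hidden_code', 'HIDDEN_LEN', 'target' and 'unhide' and the key 5Ah are
- hypothetical, and DS = ES = CS is assumed, as in a .COM file). It copies a
- block stored as XOR-encrypted data over live code, decrypting it on the way:
- mov si, offset hidden_code ;encrypted bytes stored as data
- mov di, offset target ;code location to overwrite
- mov cx, HIDDEN_LEN ;number of bytes to write
- cld ;copy upward through memory
- unhide: lodsb ;al = next encrypted byte
- xor al, 5Ah ;decrypt with the one-byte key
- stosb ;write it over the old code
- loop unhide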
-
- The following is a complete asm program to demonstrate patching "live" code;
- it asks the user for a password, then changes the string to be printed
- depending on whether or not the password is correct:
- ; smc1.asm ==================================================================
- .286
- .model small
- .stack 200h
- .DATA
- ;buffer for Keyboard Input, formatted for easy reference:
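- MaxKbLength db 05h ; max chars to read, including the CR
- KbLength db 00h ; DOS stores the number of chars read here
- KbBuffer db 5 dup (0) ; 4 password chars + terminating CR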
-
- ;strings: note the password is not encrypted, though it should be...
- szGuessIt db 'Care to guess the super-secret password?',0Dh,0Ah,'$'
- szString1 db 'Congratulations! You solved it!',0Dh,0Ah, '$'
- szString2 db 'Ah, damn, too bad eh?',0Dh,0Ah,'$'
- secret_word db "this"
-
- .CODE
- ;===========================================
- start:
- mov ax,@data ; set segment registers
- mov ds, ax ; same as "assume" directive
- mov es, ax
- call Query ; prompt user for password
- mov ah, 0Ah ; DOS 'Get Keyboard Input' function
- mov dx, offset MaxKbLength ; start of buffer
- int 21h
- call Compare ; compare passwords and patch
- exit:
- mov ah,4ch ; 'Terminate to DOS' function
- int 21h
- ;===========================================
- Query proc
- mov dx, offset szGuessIt ; Prompt string
- mov ah, 09h ; 'Display String' function
- int 21h
- ret
- Query endp
- ;===========================================
- Reply proc
- PatchSpot:
- mov dx, offset szString2 ; 'You failed' string
- mov ah, 09h ; 'Display String' function
- int 21h
- ret
- Reply endp
- ;===========================================
- Compare proc
- mov cx, 4 ; # of bytes in password
- mov si, offset KbBuffer ; start of password-input in Buffer
- mov di, offset secret_word ; location of real password
- cld ; compare upward through memory
- repe cmpsb ; compare until mismatch or cx = 0
- jne bad_guess ; last compare differed: do not patch
- mov word ptr cs:PatchSpot[1], offset szString1 ;patch to GoodString
- bad_guess:
- call Reply ; output string to display result
- ret
- Compare endp
- end start
- ; EOF =======================================================================
-
-
- Episode 2: Encryption
- ---------------------
- Encryption is undoubtedly the most common form of SMC code used today. It is
- used by packers and exe-encryptors to either compress or hide code, by viruses
- to disguise their contents, by protection schemes to hide data. The basic
- format of encryption SMC would be:
- mov reg1, addr-to-write-to
- mov reg2, [reg1]
- manipulate reg2
- mov [reg1], reg2
- where 'reg1' would be a register containing the address (offset) of the
- location to write to, and reg2 would be a temporary register which loads the
- contents of the first and then modifies them via mathematical (ROL) or logical
- (XOR) operations. The address to be patched is stored in reg1, its contents
- modified within reg2, and then written back to the original location still
- stored in reg1.
-
- The program given in the preceding section can be modified so that it
- unencrypts the password by overwriting it (so that it remains unencrypted
- until the program is terminated) by first changing the 'secret_word' value as
- follows:
- secret_word db 06Ch, 04Dh, 082h, 0D0h
-
- and then by changing the 'Compare' routine to patch the 'secret_word' location
- in the data segment:
- ;===========================================
- magic_key db 18h, 25h, 0EBh, 0A3h ;not very secure!
-
- Compare proc ;Step 1: Unencrypt password
- mov al, [magic_key] ; put byte1 of XOR mask in al
- mov bl, [secret_word] ; put byte1 of password in bl
- xor al, bl
- mov byte ptr secret_word, al ; patch byte1 of password
- mov al, [magic_key+1] ; put byte2 of XOR mask in al
- mov bl, [secret_word+1] ; put byte2 of password in bl
- xor al, bl
- mov byte ptr secret_word[1], al ; patch byte2 of password
- mov al, [magic_key+2] ; put byte3 of XOR mask in al
- mov bl, [secret_word+2] ; put byte3 of password in bl
- xor al, bl
- mov byte ptr secret_word[2], al ; patch byte3 of password
- mov al, [magic_key+3] ; put byte4 of XOR mask in al
- mov bl, [secret_word+3] ; put byte4 of password in bl
- xor al, bl
- mov byte ptr secret_word[3], al ; patch byte4 of password
- mov cx, 4 ;Step 2: Compare passwords (same job as in smc1.asm)
- mov si, offset KbBuffer
- mov di, offset secret_word
- cld
- repe cmpsb
- jne bad_guess
- mov word ptr cs:PatchSpot[1], offset szString1
- bad_guess:
- call Reply
- ret
- Compare endp
-
- Note the addition of the 'magic_key' location which contains the XOR mask for
- the password. This whole thing could have been made more sophisticated with a
- loop, but with only four bytes the above speeds debugging time (and, thereby,
- article-writing time). Note how the password is loaded, XORed, and re-written
- one byte at a time; using 32-bit code, the whole (dword) password could be
- loaded, XORed, and re-written at once.
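-
- For reference, a loop version might look like the following sketch (the
- label 'decrypt' is made up; it assumes, as above, that 'magic_key' and
- 'secret_word' are each 4 bytes long and that the assembler supplies the
- segment override for 'magic_key' in the code segment):
- mov cx, 4 ; # of bytes in password
- xor bx, bx ; bx = byte index
- decrypt:
- mov al, magic_key[bx] ; next byte of XOR mask
- xor byte ptr secret_word[bx], al ; unencrypt password byte in place
- inc bx
- loop decrypt
- The first pass computes 06Ch XOR 18h = 74h, the ASCII code for 't'.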
-
-
- Episode 3: Fooling with the stack
- ---------------------------------
- This is a trick I learned while decompiling some of SunTzu's code. What
- happens here is pretty interesting: the stack is moved into the code segment
- of the program, such that the top of the stack is set to the first address to
- be patched (which, BTW, should be the one closest to the end of the program
- due to the way the stack works); the byte at this address is then POPed into a
- register, manipulated, and PUSHed back to its original location. The stack
- pointer (SP) is then decremented so that the next address to be patched (1
- byte lower in memory) is now at the top of the stack.
-
- In addition, the bytes are being XORed with a portion of the program's own
- code, which disguises somewhat the actual value of the XOR mask. In the
- following code, I chose to use the bytes from Start: (200h when compiled)
- up to --but not including-- Exit: (214h when compiled; Exit-1 = 213h).
- However, as with SunTzu's original code I kept the "reverse" sequence of the
- XOR mask such that byte 213h is the first byte of the XOR mask, and byte 200h
- is the last. After some experimentation I found this was the easiest way to
- sync a patch program--or a hex editor--to the stack-manipulative code; since
- the stack moves backwards (a forward-moving stack is more trouble than it is
- worth), using a "reverse" XOR mask allows both filepointers in a patcher to be
- INCed or DECed in sync.
-
- Why is this an issue? Unlike the previous two examples, the following does not
- contain the encrypted version of the code-to-be-patched. It simply contains
- the source code which, when compiled, results in the unencrypted bytes which
- are then run through the XOR routine, encrypted, and then executed (which, if
- you have followed thus far, you will immediately recognize as no good...
- though it is a fantastic way of crashing the DOS VM!).
-
- Once the program is compiled you must either patch the bytes-to-be-decrypted
- manually, or write a patcher to do the job for you. The former is more
- expedient, the latter is more certain and is a must if you plan on maintaining
- the code. In the following example I have embedded 2 CCh's (Int3) in the code
- at the fore and aft end of the bytes-to-be-decrypted section; a patcher need
- simply search for these, count the bytes in between, and then XOR with the
- bytes between 200-213h.
-
- Once again, this sample is a continuation of the previous example. In it, I
- have written a routine to decrypt the entire 'Compare' routine of the previous
- section by XORing it with the bytes between 'Start' and 'Exit'. This is
- accomplished by setting the stack segment equal to the code segment, then
- setting the stack pointer equal to the end (highest) address of the code to be
- modified. A byte is POPed from the stack (i.e. its original location), XORed,
- and PUSHed back to its original location. The next byte is loaded by
- decrementing the stack pointer. Once all of the code is decrypted, control is
- returned to the newly-decrypted 'Compare' routine and normal execution
- resumes.
-
- ;===========================================
- magic_key db 18h, 25h, 0EBh, 0A3h
-
- Compare proc
- mov cx, offset EndPatch[1] ;start addr-to-write-to + 1
- sub cx, offset patch_pwd ;end addr-to-write-to
- mov ax, cs
- mov dx, ss ;save stack segment--important!
- cli ;no interrupts while the stack sits in the code
- mov ss, ax ;set stack segment to code segment
- mov bx, sp ;save stack pointer
- mov sp, offset EndPatch ;start addr-to-write-to
- mov si, offset Exit-1 ;start addr of XOR mask
- XorLoop:
- pop ax ;get byte-to-patch into AL
- xor al, cs:[si] ;XOR al with XorMask (mask bytes are in the code)
- push ax ;write byte-to-patch back to memory
- dec sp ;load next byte-to-patch
- dec si ;load next byte of XOR mask
- cmp si, offset Start ;end addr of XOR mask
- jae GoLoop ;if not at end of mask, keep going
- mov si, offset Exit-1 ;start XOR mask over
- GoLoop:
- loop XorLoop ;XOR next byte
- mov sp, bx ;restore stack pointer
- mov ss, dx ;restore stack segment
- sti ;stack is safe again, allow interrupts
- jmp patch_pwd
- db 0CCh,0CCh ;Identification mark: START
- patch_pwd: ;no changes from here
- mov al, [magic_key]
- mov bl, [secret_word]
- xor al, bl
- mov byte ptr secret_word, al
- mov al, [magic_key+1]
- mov bl, [secret_word+1]
- xor al, bl
- mov byte ptr secret_word[1], al
- mov al, [magic_key+2]
- mov bl, [secret_word+2]
- xor al, bl
- mov byte ptr secret_word[2], al
- mov al, [magic_key+3]
- mov bl, [secret_word+3]
- xor al, bl
- mov byte ptr secret_word[3], al
- ;compare password
- mov cx, 4
- mov si, offset KbBuffer
- mov di, offset secret_word
- cld
- repe cmpsb
- jne bad_guess
- mov word ptr cs:PatchSpot[1], offset szString1
- bad_guess:
- call Reply
- ret
- Compare endp
- EndPatch:
- db 0CCh, 0CCh ;Identification Mark: END
-
- This kind of program is very hard to debug. For testing, I substituted 'xor
- al, [si]' first with 'xor al, 00h', which would cause no encryption and is
- useful for testing code for final bugs, and then with 'xor al, 0EBh', which
- allowed me to verify that the correct bytes were being encrypted (it never
- hurts to check, after all).
-
-
- Episode 4: Summation
- --------------------
- That should demonstrate the basics of self-modifying code. There are a few
- techniques to consider to make development easier, though really any SMC
- programs will be tricky.
-
- The most important thing is to get your program running completely before you
- start overwriting any of its code segments. Next, always create a program that
- performs the reverse of any decryption/encryption code--not only does this
- speed up compilation and testing by automating the encryption of code areas
- that will be decrypted at runtime, it also provides a good tool for error
- checking using a disassembler (i.e. encrypt the code, disassemble, decrypt the
- code, disassemble, compare). In fact, it is a good idea to encapsulate the SMC
- portion of your program in a separate executable and test it on the compiled
- "release product" until all of the bugs are out of the decryption routine, and
- only then add the decryption routine to your final code. The CCh 'landmarks'
- (codemarks?) are extremely useful as well.
-
- Finally, do your debugging with debug.com for DOS applications--the debugger
- is quick, small, and if it crashes you simply lose a Windows DOS box. The
- ability to view the program address space after the program has terminated but
- before it is unloaded is another distinct advantage.
-
- More complex examples of SMC programs can be found in Dark Angel's code, the
- Rhince engine, or in any of the permutation engines used in polymorphic
- viruses. Acknowledgements go to Sun-Tzu for the stack technique used in his
- ghf-crackme program.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................FEATURE.ARTICLE
- Going Ring0 in Windows 9x
- by Halvar Flake
-
-
- This article gives a short overview of two ways to go Ring0 in Windows 9x in
- an undocumented way, exploiting the fact that none of the important system
- tables in Win9x are on pages which are protected from low-privilege access.
-
- A basic knowledge of Protected Mode and OS Internals is required; refer to
- your Assembly Book for that :-) The techniques presented here are in no way a
- good/clean way to get to a higher privilege level, but since they require only
- a minimal coding effort, they are sometimes more desirable to implement than a
- full-fledged VxD.
-
- 1. Introduction
- ---------------
- Under all modern Operating Systems, the CPU runs in protected mode, taking
- advantage of the special features of this mode to implement virtual memory,
- multitasking etc. To manage access to system-critical resources (and thus to
- provide stability) an OS needs privilege levels, so that a program can't
- just switch out of protected mode etc. These privilege levels are represented
- on the x86 (I refer to x86 meaning 386 and following) CPU by 'Rings', with
- Ring0 being the most privileged and Ring3 being the least privileged level.
- Theoretically, the x86 is capable of 4 privilege levels, but Win32 uses only
- two of them, Ring0 as 'Kernel Mode' and Ring3 as 'User Mode'.
-
- Since Ring0 is not needed by 99% of all applications, the only documented way
- to use Ring0 routines in Win9x is through VxDs. But VxDs, while being the only
- stable and recommended way, are big and take work to write, so in a couple of
- specialized situations, other ways to go Ring0 are useful.
-
- The CPU itself handles privilege level transitions in two ways: Through
- Exceptions/Interrupts and through Callgates. Callgates can be put in the LDT or
- GDT, Interrupt-Gates are found in the IDT.
-
- We'll take advantage of the fact that these tables can be freely written to
- from Ring3 in Win9x (NOT IN NT !).
-
-
- 2. The IDT method
- -----------------
- If an exception occurs (or is triggered), the CPU looks up the corresponding
- descriptor in the IDT. This descriptor gives the CPU an Address and Segment
- to transfer control to. An Interrupt Gate descriptor looks like this:
-
- ---------------------------------  ---------------------------------
-                                       D D
-  1.Offset (16-31)                   P P P 0 1 1 1 0 0 0 0 R R R R R   +4
-                                       L L
- ---------------------------------  ---------------------------------
-  2.Segment Selector                 3.Offset (0-15)                    0
- ---------------------------------  ---------------------------------
- DPL == Two bits containing the Descriptor Privilege Level
- P == Present bit
- R == Reserved bits
-
- The first word (Nr.3) contains the lower word of the 32-bit address of the
- Exception Handler. The word at +6 contains the high-order word. The word at +2
- is the selector of the segment in which the handler resides.
-
- The word at +4 identifies the descriptor as Interrupt Gate, contains its
- privilege and the present bit. Now, to use the IDT to go Ring0, we'll create a
- new Interrupt Gate which points to our Ring0 procedure, save an old one and
- replace it with ours.
-
- Then we'll trigger that exception. Instead of passing control to Windows' own
- handler, the CPU will now execute our Ring0 code. As soon as we're done, we'll
- restore the old Interrupt Gate.
-
- In Win9x, the selector 0028h always points to a Ring0-Code Segment, which spans
- the entire 4 GB address range. We'll use this as our Segment selector.
-
- The DPL has to be 3, as we're calling from Ring3, and the present bit must be
- set. So the word at +4 will be 1110111000000000b => EE00h. These values can
- be hardcoded into our program; we just have to add the offset of our Ring0
- procedure to the descriptor. As the exception, you should preferably use one
- that rarely occurs, so do not use exception 14 (0Eh, the page fault) ;-)
-
- I'll use int 9h, since it is (to my knowledge) not used on 486+.
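-
- To make the bit layout of the word at +4 explicit, it can be assembled from
- named constants. This is only a sketch; the constant names are made up for
- illustration:
-
- GATE_PRESENT equ 8000h ; bit 15: present
- GATE_DPL3 equ 6000h ; bits 13-14: DPL = 3
- GATE_INT32 equ 0E00h ; bits 8-12: 01110b = 32-bit Interrupt Gate
-
- dw GATE_PRESENT or GATE_DPL3 or GATE_INT32 ; = 0EE00h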
-
- Example code follows (to be compiled with TASM 5):
-
- -------------------------------- bite here -----------------------------------
-
- .386P
- LOCALS
- JUMPS
- .MODEL FLAT, STDCALL
-
- EXTRN ExitProcess : PROC
-
- .data
-
- IDTR df 0 ; This will receive the contents of the IDTR
- ; register
-
- SavedGate dq 0 ; We save the gate we replace in here
-
- OurGate dw 0 ; Offset low-order word
- dw 028h ; Segment selector
- dw 0EE00h ; P=1, DPL=3, 32-bit Interrupt Gate
- dw 0 ; Offset high-order word
-
-
-
- .code
-
- Start:
- mov eax, offset Ring0Proc
- mov [OurGate], ax ; Put the offset words
- shr eax, 16 ; into our descriptor
- mov [OurGate+6], ax
-
- sidt fword ptr IDTR
- mov ebx, dword ptr [IDTR+2] ; load IDT Base Address
- add ebx, 8*9 ; Address of int9 descriptor in ebx
-
- mov edi, offset SavedGate
- mov esi, ebx
- movsd ; Save the old descriptor
- movsd ; into SavedGate
-
- mov edi, ebx
- mov esi, offset OurGate
- movsd ; Replace the old handler
- movsd ; with our new one
-
- int 9h ; Trigger the exception, thus
- ; passing control to our Ring0
- ; procedure
-
- mov edi, ebx
- mov esi, offset SavedGate
- movsd ; Restore the old handler
- movsd
-
- call ExitProcess, LARGE -1
-
- Ring0Proc PROC
- mov eax, CR0
- iretd
- Ring0Proc ENDP
-
- end Start
-
- -------------------------------- bite here -----------------------------------
-
-
- 3. The LDT Method
- -----------------
- Another possibility of executing Ring0-Code is to install a so-called callgate
- in either the GDT or LDT. Under Win9x it is a little bit easier to use the LDT,
- since the first 16 descriptors in it are always empty, so I will only give
- source for that method here.
-
- A Callgate is similar to an Interrupt Gate and is used in order to transfer
- control from a low-privileged segment to a high-privileged segment using a CALL
- instruction.
-
- The format of a callgate is:
-
- ---------------------------------  ---------------------------------
-                                       D D                 D D D D D
-  1.Offset (16-31)                   P P P 0 1 1 0 0 0 0 0 W W W W W   +4
-                                       L L                 C C C C C
- ---------------------------------  ---------------------------------
-  2.Segment Selector                 3.Offset (0-15)                    0
- ---------------------------------  ---------------------------------
- P == Present bit
- DPL == Descriptor Privilege Level
- DWC == Dword Count (5 bits), number of arguments copied to the ring0 stack
-
- So all we have to do is to create such a callgate, write it into one of the
- first 16 descriptors, then do a far call to that descriptor to execute our
- Ring0 code.
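-
- The selector for that far call encodes the descriptor index, the table, and
- the requested privilege level. A sketch of how the value used below is put
- together (the constant names are made up for illustration):
-
- SEL_INDEX equ 1 ; descriptor 1, at byte offset 8 in the table
- SEL_TI_LDT equ 4 ; bit 2 set: look in the LDT, not the GDT
- SEL_RPL3 equ 3 ; bits 0-1: requested privilege level 3
-
- dw (SEL_INDEX shl 3) or SEL_TI_LDT or SEL_RPL3 ; = 000Fh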
-
- Example Code:
-
- -------------------------------- bite here -----------------------------------
-
- .386P
- LOCALS
- JUMPS
- .MODEL FLAT, STDCALL
-
- EXTRN ExitProcess : PROC
-
- .data
-
- GDTR df 0 ; This will receive the contents of the GDTR
- ; register
-
- CallPtr dd 00h ; As we're using the first descriptor (8) and
- dw 0Fh ; it's located in the LDT and the privilege level
- ; is 3, our selector will be 000Fh.
- ; That is because the low-order two bits of the
- ; selector are the privilege level, and the 3rd
- ; bit is set if the selector is in the LDT.
-
- OurGate dw 0 ; Offset low-order word
- dw 028h ; Segment selector
- dw 0EC00h ; P=1, DPL=3, 32-bit Call Gate, no parameters
- dw 0 ; Offset high-order word
-
- .code
-
- Start:
- mov eax, offset Ring0Proc
- mov [OurGate], ax ; Put the offset words
- shr eax, 16 ; into our descriptor
- mov [OurGate+6], ax
-
- xor eax, eax
-
- sgdt fword ptr GDTR
- mov ebx, dword ptr [GDTR+2] ; load GDT Base Address
- sldt ax
- add ebx, eax ; Address of the LDT descriptor in
- ; ebx
- mov al, [ebx+4] ; Load the base address
- mov ah, [ebx+7] ; of the LDT itself into
- shl eax, 16 ; eax, refer to your pmode
- mov ax, [ebx+2] ; manual for details
-
- add eax, 8 ; Skip NULL Descriptor
-
- mov edi, eax
- mov esi, offset OurGate
- movsd ; Move our custom callgate
- movsd ; into the LDT
-
- call fword ptr [CallPtr] ; Execute the Ring0 Procedure
-
- xor eax, eax ; Clean up the LDT
- sub edi, 8
- stosd
- stosd
-
- call ExitProcess, LARGE -1
-
- Ring0Proc PROC
- mov eax, CR0
- retf
- Ring0Proc ENDP
-
- end Start
-
- -------------------------------- bite here -----------------------------------
-
- Well, that's all for now folks. This method can be easily changed to use the
- GDT instead, which would save a few bytes in case you have to optimize heavily.
-
- Anyway, do use these methods with care; they will NOT run on NT and are
- generally not exactly a clean or stable way to do these things.
-
-
- Credits & Thanks
- ----------------
- The IDT-Method taken from the CIH virus & Stone's example source at
- http://www.cracking.net.
- The LDT-Method was done by me, but without IceMan's & The_Owl's help I would
- still be stuck, so all credits go to them.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
- Win32 ASM: The Basics
- by Iczelion
-
-
- The required tools:
- -Microsoft Macro Assembler 6.1x : MASM support of Win32 programming
- starts from version 6.1. The latest version is 6.13 which
- is a patch to previous version of 6.11. Win98 DDK includes MASM
- 6.11d which you can download from Microsoft at
- http://www.microsoft.com/hwdev/ddk/download/win98ddk.exe
- But be warned, this monstrosity is huge, 18.5 MB in size. MASM 6.13
- patch can also be downloaded from
- ftp://ftp.microsoft.com/softlib/mslfiles/ml613.exe
- -Microsoft import libraries : You can use the import libraries from
- Visual C++. Some are included in Win98 DDK.
- -Win32 API Reference : You can download it from Borland's site:
- ftp://ftp.borland.com/pub/delphi/techpubs/delphi2/win32.zip
-
- Here's a brief description of the assembly process.
-
- MASM 6.1x comes with two essential tools: ml.exe and link.exe. ml.exe is the
- assembler. It takes in the assembly source code (.asm) and produces an object
- file (.obj). An object file is an intermediate file between the source code
- and the executable file. It needs some address fixups which are the services
- provided by link.exe. Link.exe makes an object file into an executable file by
- several means such as adding the code from other modules to the object files,
- providing the address fixups, adding resources, etc.
-
- For example:
- ml skeleton.asm ---> this produces skeleton.obj
- link skeleton.obj ---> this produces skeleton.exe
-
- The above lines are a simplification of course. In the real world, you must add
- several switches to ml.exe and link.exe to customize your application. Also
- there will be several files you must link with the object file in order to
- create your application.
-
- Win32 programs run in protected mode, which has been available since the 80286.
- But the 80286 is now history, so we only have to concern ourselves with the
- 80386 and its descendants. Windows runs each Win32 program in a separate
- virtual address space. That means each Win32 program will have its own 4 GB
- address space. Each program is alone in its address space. This is in contrast
- to the situation in Win16. All Win16 programs can *see* each other. Not so in
- Win32. This feature helps reduce the chance of one program writing over
- another program's code/data.
-
- Memory model is also drastically different from the old days of the 16-bit
- world. Under Win32, we need not be concerned with memory model or segment
- anymore! There's only one memory model: Flat memory model. There's no more 64K
- segments. The memory is a large continuous space of 4 GB. That also means you
- don't have to play with segment registers. You can use any segment register to
- address any point in the memory space. That's a GREAT help to programmers. This
- is what makes Win32 assembly programming as easy as C.
-
- We will examine a minimal skeleton of a Win32 assembly program. We'll add more
- flesh to it later. Here's the skeleton program. If you don't understand some of
- the code, don't panic. I'll explain each part later.
-
- .386
- .MODEL Flat, STDCALL
- .DATA
- <Your initialized data>
- ......
- .DATA?
- <Your uninitialized data>
- ......
- .CONST
- <Your constants>
- ......
- .CODE
- <label>
- <Your code>
- .....
- end <label>
-
- That's all! Let's analyze this skeleton program.
-
- .386
- This is an assembler directive, telling the assembler to use 80386 instruction
- set. You can also use .486, .586 but the safest bet is to stick to .386.
-
- .MODEL FLAT, STDCALL
- .MODEL is an assembler directive that specifies the memory model of your
- program. Under Win32, there's only one model, the FLAT model. STDCALL tells
- MASM which parameter passing convention to use. A parameter passing convention
- specifies the order in which parameters are passed, left-to-right or
- right-to-left, and also who will balance the stack frame after the function
- call.
-
- Under Win16, there are two types of calling convention, C and PASCAL. The C
- calling convention passes parameters to the function from right to left, that
- is, the rightmost parameter is pushed on the stack first. The caller is
- responsible for balancing the stack frame after the call. For example, in order
- to call a function named foo(int first_param, int second_param, int
- third_param) in the C calling convention, the asm code will look like this:
-
- push [third_param] ; Push the third parameter
- push [second_param] ; Followed by the second
- push [first_param] ; And the first
- call foo
- add sp, 12 ; The caller removes the 3 parameters (assuming 4 bytes each)
-
- PASCAL calling convention is the reverse of C calling convention. It pushes
- parameters on the stack from left to right and the callee is responsible for
- the stack balancing after the call.
-
- Win16 adopts the PASCAL convention because it produces smaller code. The C convention
- is useful when you don't know how many parameters will be passed to the
- function as in the case of wsprintf(). In the case of wsprintf(), the function
- has no way to determine beforehand how many parameters will be pushed on the
- stack, so it cannot balance the stack correctly. The caller is the one who
- knows how many bytes are pushed on the stack so it's right and proper that it's
- also the one who balances the stack frame after the call.
-
- STDCALL is a hybrid of the C and PASCAL conventions. It pushes parameters on
- the stack from right to left, but the callee is responsible for stack balancing
- after the call. The Win32 platform uses STDCALL exclusively, with one exception:
- wsprintf(). You must use the C calling convention with wsprintf().
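-
- To see why a function like wsprintf() cannot clean up its own stack, consider
- this small C sketch of a variable-argument function. sum_ints is a made-up
- name, not a Windows API; the point is only that the number of arguments is
- known to the caller alone, so the caller has to be the one to remove them.
-
```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic helper: adds up `count` int arguments.
   The callee learns the argument count only at run time (here from its
   first parameter), so it cannot end with a fixed "ret n". */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next pushed argument */
    va_end(args);
    return total;
}
```
-
- The same function can be called as sum_ints(2, 10, 20) or as
- sum_ints(4, 1, 2, 3, 4); each call site pushes a different number of bytes.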
-
- .DATA
- .DATA?
- .CONST
- .CODE
- All four directives are what are called sections. You don't have segments in
- Win32 anymore, remember? But you can divide your entire address space into
- logical sections. The start of one section denotes the end of the previous
- section. There are two groups of sections: data and code. Data sections are
- divided into 3 categories:
-
- * .DATA This section contains initialized data of your program.
- * .DATA? This section contains uninitialized data of your program.
- Sometimes you just want to preallocate some memory but don't want to
- initialize it. This section exists for that purpose.
- * .CONST This section contains declaration of constants used by your
- program. Constants in this section can never be modified in your
- program. They are just *constant*.
-
- You don't have to use all three sections in your program. Declare only the
- section(s) you want to use.
-
- There's only one section for code: .CODE. This is where your code resides.
- Example:
-
- <label>
- end <label>
-
- ...where <label> is any arbitrary label used to mark the extent of your
- code. Both labels must be identical. All your code must reside between
- <label> and end <label>.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
- MessageBox Display
- by Iczelion
-
-
- We will create a fully functional Windows program that displays a message box
- saying "Win32 assembly is great!".
-
- Windows prepares a wealth of resources for use by Windows programs. Central to
- this is the Windows API (Application Programming Interface). Windows API is a
- huge collection of very useful functions that resides in Windows itself, ready
- to be used by any Windows programs.
-
- These functions are stored in several dynamic-linked libraries (DLLs) such as
- kernel32.dll, user32.dll and gdi32.dll, to name a few. Kernel32.dll contains
- API functions that deal with memory and process management. User32.dll controls
- the user interface aspects of your programs. Gdi32.dll is responsible for
- graphics operation. Other than "the main three", there are other DLLs that your
- program can make use of, provided you have enough information about the desired
- API functions stored in them.
-
- Windows programs dynamically link to these DLLs, i.e. the codes of API
- functions are not included in the executable file. This is very different from
- what's called static linking in which actual codes from software libraries are
- included in the executable files. In order for programs to know where to find
- the desired API functions at runtime, enough information must be embedded into
- the executable file for it to be able to select the correct DLLs and correct
- functions. That information is in import libraries. You must link your
- programs with the correct import libraries or it will not be able to locate
- the desired API functions.
-
- There are two types of API functions: one for ANSI and the other for Unicode.
- The names of API functions for ANSI are postfixed with "A", e.g. MessageBoxA.
- Those for Unicode are postfixed with "W" (for Wide Char, I think).
-
- Windows 95 natively supports ANSI and Windows NT Unicode. But most of the time,
- you will use an include file which can determine and select the appropriate API
- functions for your platform. Just refer to the API function name without the
- postfix.
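-
- In the C headers this selection is typically made at build time by a
- preprocessor symbol rather than by detecting the platform. The sketch below
- imitates that with made-up stand-in functions (DemoBoxA and DemoBoxW are not
- real API names) and assumes the UNICODE symbol is left undefined.
-
```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the two real entry points; they only report which
   variant was chosen so that the selection can be observed. */
const char *DemoBoxA(void) { return "ANSI"; }
const char *DemoBoxW(void) { return "Unicode"; }

/* The include file maps the unsuffixed name onto one of the variants. */
#ifdef UNICODE
#define DemoBox DemoBoxW
#else
#define DemoBox DemoBoxA
#endif

/* Code written against the unsuffixed name picks up whichever was mapped. */
const char *selected_variant(void)
{
    const char *name = DemoBox();
    assert(strlen(name) > 0);
    return name;
}
```
-
- Building the same source with UNICODE defined would route DemoBox to the "W"
- variant without touching the calling code.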
-
- I'll present the bare program skeleton below. We will fill it out later.
-
- .386
- .model flat, stdcall
- .data
- .code
- Main:
- end Main
-
- Every Windows program must call an API function, ExitProcess, when it wants to
- quit to Windows. In this respect, ExitProcess is equivalent to int 21h, ah=4Ch
- in DOS.
-
- Here's the function prototype of ExitProcess from winbase.h:
-
- void WINAPI ExitProcess(UINT uExitCode);
-
- -void means the function does not return any value to the caller.
- -WINAPI is an alias of STDCALL calling convention.
- -UINT is a data type, "unsigned integer", which is a 32-bit value under Win32
- (it's a 16-bit value under Win16)
- -uExitCode is the 32-bit return code to Windows. This value is not used by
- Windows as of now.
-
- In order to call ExitProcess from an assembly program, you must first declare
- the function prototype for ExitProcess.
-
- .386
- .model flat, stdcall
- ExitProcess PROTO :DWORD
- .data
- .code
- Main:
- invoke ExitProcess, 0
- end Main
-
- That's it. Your first working Win32 program. Save it under the name msgbox.asm.
- Assuming ml.exe is in your path, assemble msgbox.asm with:
-
- ml /c /coff /Cp msgbox.asm
-
- /c tells MASM to assemble the source file into an object file only. Do not
- invoke Link.exe automatically.
- /coff tells MASM to create .obj file in COFF format.
- /Cp tells MASM to preserve case of user identifiers
-
- Then go on with link:
-
- link /SUBSYSTEM:WINDOWS /LIBPATH:c:\masm\lib msgbox.obj
- kernel32.lib
-
- /SUBSYSTEM:WINDOWS informs Link.exe on which platform the executable is
- intended to run
- /LIBPATH:<path to import library> tells Link where the import libraries
- are. In my PC, they're located in c:\masm\lib.
-
- Now you have msgbox.exe. Go on, run it. You'll find that it does nothing.
- Well, we haven't put anything interesting in it yet. But it's a Windows
- program nonetheless. And look at its size! On my PC, it is 1,536 bytes.
-
- The line:
-
- ExitProcess PROTO :DWORD
-
- is a function prototype. You create one by declaring the function name followed
- by the keyword "PROTO" and a list of the parameters' data types, each prefixed
- by a colon. MASM uses function prototypes for type checking, which prevents
- nasty stack errors that might otherwise pass unnoticed.
-
- The best place for function prototypes is in an include file. You can create an
- include file full of frequently used function prototypes and data structures
- and include it at the beginning of your asm source code.
-
- You call the API function by using "invoke" keyword:
-
- invoke ExitProcess, 0
-
- INVOKE is really a kind of high-level call. It checks number and types of
- parameters and pushes parameters on the stack according to the specified
- calling convention (in this case, stdcall). By using INVOKE instead of a normal
- call, you can prevent stack errors from incorrect parameter passing. Very
- useful. The syntax is:
-
- INVOKE expression [,arguments]
-
- where expression is a label or function name.
-
- Next we're going to put a message box in our program. Its function declaration
- is:
-
- int WINAPI MessageBoxA(HWND hwnd, LPCSTR lpText, LPCSTR lpCaption, UINT
- uType);
-
- -hwnd is the handle to parent window
- -lpText is a pointer to the text you want to display in the client area of the
- message box
- -lpCaption is a pointer to the caption of the message box
- -uType specifies the icon and the number and type of buttons on the message
- box
-
- Under Win32 , HWND, LPCSTR, and UINT are all 32 bits in size.
-
- Let's modify msgbox.asm to include the message box.
-
- .386
- .model flat, stdcall
- ExitProcess PROTO :DWORD
- MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD
- .data
- MsgBoxCaption db "Our First Program",0
- MsgBoxText db "Win32 Assembly is Great!",0
- .const
- NULL equ 0
- MB_OK equ 0
- .code
- Main:
- INVOKE MessageBoxA, NULL, ADDR MsgBoxText, ADDR MsgBoxCaption, MB_OK
- INVOKE ExitProcess, NULL
- end Main
-
- Assemble it by:
- ml /c /coff /Cp msgbox.asm
- link /SUBSYSTEM:WINDOWS /LIBPATH:c:\masm\lib msgbox kernel32.lib
- user32.lib
-
- You have to include user32.lib in your Link parameter, since link info of
- MessageBoxA is in user32.lib.
-
- You'll see a message box displaying the text "Win32 Assembly is Great!". Let's
- look again at the source code:
-
- We define two zero-terminated strings in .data section. Remember that all
- strings in Windows must be terminated with zero (ASCIIZ).
-
- We define two constants in .const section. We use constants to improve the
- clarity of the source code.
-
- Look at the parameters of MessageBoxA. The first parameter is NULL. This
- means that there's no window that *owns* this message box.
-
- The operator "ADDR" is used to pass the address of the label to the function.
- This operator is specific to MASM. No TASM-equivalent exists. It functions like
- "OFFSET" operator but with some differences:
- 1. It doesn't accept forward reference. If you want to use "ADDR foo",
- you have to declare "foo" before using ADDR operator.
- 2. It can be used with a local variable. A local variable is the
- variable that is created on the stack. OFFSET operator cannot be
- used in this situation because the assembler doesn't know the true
- address of the local variable at assemble time.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::........................THE.C.STANDARD.LIBRARY.IN.ASSEMBLY
- The _itoa, _ltoa and _ultoa functions
- by Xbios2
-
-
- ATTENTION I:
- This is based on Borland's C++ 4.02. Whenever possible I've checked it with any
- other library / program containing the specific functions, but differences may
- exist between this and your version of C. Also this is strictly 32-bit code,
- Windows compiler. No DOS or UNIX.
-
- ATTENTION II:
- Size comparisons are extremely easy to do. Speed comparisons aren't. The
- differences in speed I give are based on RDTSC timings, but they DON'T take
- into account extreme cases. That's why I don't give exact clock cycles. Of
- course if you need exact clock cycles for your Pentium II, you can always buy
- me one :)
-
-
- The C language offers three functions to convert an integer to ASCII:
-
- char *itoa(int value, char *string, int radix);
- char *ltoa(long value, char *string, int radix);
- char *ultoa(unsigned long value, char *string, int radix);
-
- _itoa and _ltoa do _exactly_ the same thing. This is because an integer _is_ a
- long in 32-bit code. Yet they are different: _itoa has some _completely_
- useless code in it (in 16bit this code would sign-extend value if radix=10).
- Yet the result is always the same, so _ltoa from here on means both _ltoa and
- _itoa. _ultoa is exactly the same as _ltoa and _itoa, except when radix=10 and
- value < 0.
-
- Anyway all these functions call this function:
-
- ___longtoa(value, *string, radix, signed, char10)
-
- The first three parameters are passed 'as is', signed is set to 1 by _ltoa if
- radix=10 else it is set to 0 and char10 is the character that corresponds to 10
- if radix>10, and is always set to 'a' (___longtoa is also used by printf, which
- has an option to have uppercase chars in Hex).
-
- ___longtoa does the following (and it does it with badly written code):
-
- 1. Checks that 2<=radix<=36, if it isn't returns '0'
- 2. If signed=1 and value<0 add a '-' to the string and neg the value
- 3. Loop1: create a pseudo-string in the stack, reversed
- 4. Loop2: convert and copy the pseudo-string into string
-
- The check on radix is necessary because:
- radix=0 would generate an INT0 (divide by zero)
- radix=1 would put the program in an infinite loop, destroying the stack
- radix=37 for value=36 would return '{', the character after 'z'
-
- The two loops are necessary because of the way the conversion is done (see code
- later). To implement a single-loop conversion, the number of digits should be
- calculated in advance, which results in less efficient code (the number of
- digits in value is n=(int)(log(value)/log(radix))+1, but using one more loop is
- much faster).
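-
- The four steps above can be written out in C roughly as follows. This is a
- sketch for orientation, not Borland's source: longtoa_sketch is a name
- invented here, and its last two parameters merely mirror the signed and
- char10 arguments described earlier.
-
```c
#include <assert.h>
#include <string.h>

/* Two-loop conversion: collect the digits in reverse on the stack,
   then copy them out in the right order while mapping them to characters. */
char *longtoa_sketch(long value, char *string, int radix,
                     int is_signed, char char10)
{
    char digits[sizeof(unsigned long) * 8];   /* enough even for radix 2 */
    unsigned long v = (unsigned long)value;
    char *out = string;
    int n = 0;

    if (radix < 2 || radix > 36) {            /* step 1: reject bad radix */
        *out = '\0';
        return string;
    }
    if (is_signed && value < 0) {             /* step 2: '-' and negate */
        *out++ = '-';
        v = 0UL - v;
    }
    do {                                      /* loop 1: digits, reversed */
        digits[n++] = (char)(v % (unsigned long)radix);
        v /= (unsigned long)radix;
    } while (v != 0);
    while (n > 0) {                           /* loop 2: convert and copy */
        int d = digits[--n];
        *out++ = (char)(d < 10 ? '0' + d : char10 + (d - 10));
    }
    *out = '\0';
    assert(strlen(string) == (size_t)(out - string));
    return string;
}
```
-
- Passing is_signed=1 only when radix is 10 gives the ltoa behaviour, while
- is_signed=0 gives ultoa; char10='A' would produce uppercase hex digits.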
-
- Including the disassembly of C's functions would create a really large article,
- and anyway they're just examples of really bad code. So straight to the result:
-
- ltoa proc
- cmp dword ptr [esp+0Ch], 10
- sete ch
- mov cl, 'a'-'0'-10
- jmp short longtoa
-
- ultoa:
- mov cx, 'a'-'0'-10
-
- longtoa:
- push ebx
- push edi
- push esi
- sub esp, 24h
- mov ebx, [esp+3Ch] ; radix
- mov eax, [esp+34h] ; value
- mov edi, [esp+38h] ; string
- cmp ebx, 2
- jl short _ret
- cmp ebx, 36
- jg short _ret
- or eax, eax
- jge short skip
- cmp byte ptr ch, 0 ; _ltoa ?
- jz short skip
- mov byte ptr [edi], '-'
- inc edi
- neg eax
- skip: mov esi, esp
-
- loop1: xor edx, edx
- div ebx
- mov [esi], dl
- inc esi
- or eax, eax
- jnz loop1
-
- loop2: dec esi
- mov al, [esi]
- cmp al, 10
- jl short nochar
- add al, cl
- nochar: add al, '0'
- stosb
- cmp esi, esp
- jg short loop2
-
- _ret: mov byte ptr [edi], 0
- mov eax, [esp+38h]
- add esp, 24h
- pop esi
- pop edi
- pop ebx
- ret
- ltoa endp
-
- This is a 3 into 1 procedure. ltoa and ultoa take the same parameters as the
- standard C functions. longtoa was changed to take from the stack the same
- parameters as ltoa and ultoa, while signed and char10 are passed in CH and CL
- respectively. This way ltoa and ultoa 'see' longtoa as 'their' code, not as a
- different procedure (this is to avoid a common problem in C, procedures that
- just 'forward' their parameters to another function).
-
- This code compiles to 102 bytes (and it could be optimized to gain some more
- bytes) whereas the standard C code takes 270 bytes. Specifically:
-
- function C size Asm size
- ------------------------------
- itoa 60 0
- ltoa 40 12
- ultoa 27 4
- longtoa 143 86
- ------ ------
- total 270 102
-
- It also runs 2x faster than ltoa. And of course, this is a fully C-compatible
- version of ltoa and ultoa. Of course it can be changed from C-compatible to
- suit specific needs (e.g make it stdcall instead of cdecl, or if speed and size
- are needed remove the check for the radix, and so on...)
-
- Anyway, it is rather unlikely that you'll ever use values of radix other than
- 2, 8, 10 or 16. So if speed or size is of the essence, a better, more specific
- routine can be written. For example, consider this routine which stores the
- value of EAX as a binary number at the address specified by EDI:
-
- ultob proc
- mov ecx, 32
- more1: shl eax, 1
- dec ecx
- jc more2
- jnl more1
- more2: setc dl
- add dl, '0'
- shl eax, 1
- mov [edi], dl
- inc edi
- dec ecx
- jnl more2
- mov [edi], al
- ret
- ultob endp
-
- This runs 14x faster than C ltoa, and 7x faster than Asm ltoa, and is only 29
- bytes long. But this article is long enough, so wait for another article on
- specific 'ltoa' functions (who knows, maybe if I decide to write a 'printf'
- function in Asm, which would use them...).
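-
- The skip-leading-zeros idea behind ultob carries over to C as well. This is
- only a sketch for comparison (ultob_sketch is a name invented here), and a
- compiler will not necessarily produce code as tight as the routine above.
-
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write `value` in binary at `out` (room for 33 bytes is needed),
   skipping leading zeros but always emitting at least one digit. */
char *ultob_sketch(uint32_t value, char *out)
{
    char *p = out;
    int bit = 31;

    assert(out != NULL);
    while (bit > 0 && ((value >> bit) & 1u) == 0)   /* skip leading zeros */
        bit--;
    for (; bit >= 0; bit--)                         /* emit remaining bits */
        *p++ = (char)('0' + ((value >> bit) & 1u));
    *p = '\0';                                      /* terminate the string */
    return out;
}
```
-
- As in the assembly version, a value of zero still yields the single digit
- "0" rather than an empty string.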
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::............................................THE.UNIX.WORLD
- x86 ASM Programming for Linux
- by mammon_
-
-
- Essentially this article is an excuse to combine two of my favorite coding
- interests: the Linux operating system and assembly language programming. Both
- of these need (or should need) no introduction; like Win32 assembly, Linux
- assembly runs in 32-bit protected mode...however it has the distinct advantage
- of allowing you to call the C standard library functions as well as any of the
- usual Linux "shared" library functions. I have begun with a brief introduction
- on compiling assembly language programs in Linux; for greater readability you
- may want to skip over this to the "Basics" section.
-
-
- Compiling And Linking
- ---------------------
- The two main assemblers for Linux are Nasm, the (free) Netwide Assembler, and
- GAS, the (also free) Gnu Assembler which is integrated into GCC. I will focus
- on Nasm in this article and leave GAS for a later date, as it uses the AT&T
- syntax and thus would require a lengthy introduction.
-
- Nasm should be invoked with the ELF format option ("nasm -f elf hello.asm");
- the resulting object is linked with GCC ("gcc hello.o") to produce the final
- ELF binary. The following script can be used to compile ASM modules; I wrote
- it to be very simple, so all it does is take the first filename passed to it
- (I recommend naming it with a ".asm" extension), compile it with nasm, and
- link it with gcc.
-
- #!/bin/sh
- # assemble.sh =========================================================
- outfile="${1%.*}" # strip the final extension
- tempfile=asmtemp.o
- nasm -o "$tempfile" -f elf "$1"
- gcc "$tempfile" -o "$outfile"
- rm -f "$tempfile"
- #EOF ==================================================================
-
-
- The Basics
- ----------
- It is best, of course, to start off with an example before launching into the
- OS details. Here is a very basic, "hello-world"-style program:
- ; asmhello.asm ========================================================
- global main
- extern printf
-
- section .data
- msg db "Helloooooo, nurse!",0Dh,0Ah,0
- section .text
- main:
- push dword msg
- call printf
- pop eax
- ret
- ; EOF =================================================================
- A quick rundown: the entry point must be declared global so the linker can see
- it, and since we are linking with GCC, the entry point must be named "main".
- The "extern printf" is simply a declaration for the call later in the program;
- note that this is all that is needed; the parameter sizes do not need to be
- declared. I have sectioned this example into the standard .data and .text
- sections, though this is not strictly necessary--one could get by with only a
- .text segment, just as in DOS.
-
- In the body of the code, note that you must push the parameters to the call,
- and in Nasm you must declare the size of all ambiguous (i.e. non-register)
- data: hence the "dword" qualifier. Note that, unlike in MASM or TASM, Nasm
- treats a bare label reference as the address of the memory location or
- label, not its contents. Thus, to specify the address of
- the string 'msg' you would use 'push dword msg', while to specify the contents
- of the string 'msg' you would use 'push dword [msg]' (note this will only
- contain the first 4 bytes of 'msg'). As printf requires a pointer to a string,
- we will specify the address of 'msg'.
-
- The call to printf is pretty straightforward. Note that you must clean up the
- stack after every call you make (see below); thus, having PUSHed a dword, I
- POP a dword from the stack into a "throwaway" register. Linux programs end
- simply with a RET to the OS, as each process is spawned from the shell (or PID
- 1 ;) and ends by returning control to it.
-
- Notice that in Linux you use the standard shared libraries that are shipped
- with the OS in lieu of an "API" or Interrupt Services. All external references
- will be taken care of by the GCC linker which takes a lot of the workload off
- the asm coder. Once you get used to the basic quirks, coding assembly in Linux
- is actually easier than on a DOS-based machine!
-
-
- The C Calling Syntax
- --------------------
- Linux uses the C calling convention--meaning that arguments are pushed onto the
- stack in reverse order (last arg first), and that the caller must cleanup the
- stack. You can do this either by popping values from the stack:
- push dword szText
- call puts
- pop ecx
- or by directly modifying ESP:
- push dword szText
- call puts
- add esp, 4
-
- Results from the call are returned in eax, or in edx:eax if the value is larger
- than 32 bits. EBP, ESI, EDI, and EBX are all saved and restored by the callee
- (the function you call). Note that you must yourself preserve any other
- registers you use, as the following will illustrate:
- ; loop.asm =================================================================
- global main
- extern printf
- section .text
- msg db "HoodooVoodoo WeedooVoodoo",0Dh,0Ah,0
- main:
- mov ecx, 0Ah
- push dword msg
- looper:
- call printf
- loop looper
- pop eax
- ret
- ; EOF ======================================================================
- On first glance this looks pretty simple: since you are going to use the same
- string on the 10 printf() calls, you do not need to clean up the stack. Yet
- when you run this, the loop never stops. Why? Because somewhere in the
- printf() call ECX is being used and isn't saved. So to make your loop work
- properly you must save the count value in ECX before the call and restore it
- afterwards, like so:
- ; loop.asm ================================================================
- global main
- extern printf
-
- section .text
- msg db "HoodooVoodoo WeedooVoodoo",0Dh,0Ah,0
- main:
- mov ecx, 0Ah
- looper:
- push ecx ;save Count
- push dword msg
- call printf
- pop eax ;cleanup stack
- pop ecx ;restore Count
- loop looper
- ret
- ; EOF ======================================================================
-
-
- I/O Port Programming
- --------------------
- But what about direct hardware access? In Linux you need a kernel-mode driver
- to do anything really tricky...this means your program will end up being two
- parts, one kernel-mode that provides the direct-hardware functionality, the
- other user-mode to provide an interface. The good news is that you can still
- access ports using the IN/OUT commands from a user-mode program.
-
- To access the I/O ports your program must be granted permission by the OS; to
- do that, you must make an ioperm() call. This function can only be called by a
- user with root access, so you must either setuid() the program to root or run
- the program as root. The ioperm() has the following syntax:
-
- ioperm( long StartingPort#, long #Ports, int ToggleOn-Off)
-
- which means that 'StartingPort#' specifies the first port number to access (0
- is port 0h, 40h is port 40h, etc), '#Ports' specifies how many ports to access
- (i.e., 'StartingPort# = 30h' and '#Ports = 10' would provide access to ports
- 30h-39h), and 'ToggleOn-Off' enables access if TRUE (1) or disables access if
- FALSE (0).
-
- Once the call to ioperm() is made, the requested ports may be accessed as
- normal. The program can call ioperm() any number of times and does not need to
- make a final ioperm() call to disable access (though the example below does
- so) as the OS will take care of this when the program exits.
-
- ; io.asm ====================================================================
- BITS 32
- GLOBAL main
- EXTERN printf
- EXTERN ioperm
-
- SECTION .data
- szText1 db 'Enabling I/O Port Access',0Ah,0Dh,0
- szText2 db 'Disabling I/O Port Access',0Ah,0Dh,0
- szDone db 'Done!',0Ah,0Dh,0
- szError db 'Error in ioperm() call!',0Ah,0Dh,0
- szEqual db 'Output/Input bytes are equal.',0Ah,0Dh,0
- szChange db 'Output/Input bytes changed.',0Ah,0Dh,0
-
- SECTION .text
-
- main:
- push dword szText1
- call printf
- pop ecx
- enable_IO:
- push dword 1 ; enable mode (the third argument is a 32-bit int)
- push dword 04h ; four ports
- push dword 40h ; start with port 40h
- call ioperm ; Must be SUID "root" for this call!
- add ESP, 12 ; cleanup stack (method 1)
- cmp eax, 0 ; check ioperm() results
- jne Error
-
- ;---------------------------------------Port Programming Part--------------
- SetControl:
- mov al, 96h ; R/W low byte of Counter2, mode 3
- out 43h, al ; port 43h = control register
- WritePort:
- mov bl, 0EEh ; value to send to speaker timer
- mov al, bl
- out 42h, al ; port 42h = speaker timer
- ReadPort:
- in al, 42h
- cmp al, bl ; byte should have changed--this IS a timer :)
- jne ByteChanged
- BytesEqual:
- push dword szEqual
- call printf
- pop ecx
- jmp disable_IO
- ByteChanged:
- push dword szChange
- call printf
- pop ecx
- ;---------------------------------------End Port Programming Part----------
-
- disable_IO:
- push dword szText2
- call printf
- pop ecx
- push dword 0 ; disable mode
- push dword 04h ; four ports
- push dword 40h ; start with port 40h
- call ioperm
- pop ecx ;cleanup stack (method 2)
- pop ecx
- pop ecx
- cmp eax, 0 ; check ioperm() results
- jne Error
- jmp Exit
- Error:
- push dword szError
- call printf
- pop ecx
- Exit:
- ret
- ; EOF ======================================================================
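-
- The control byte sent to port 43h packs four fields into one value. As a
- cross-check on the comment "R/W low byte of Counter2, mode 3", here is a small
- C sketch of that packing using the 8253/8254 timer's control word layout;
- pit_control_word is a name invented for this example.
-
```c
#include <assert.h>
#include <stdint.h>

/* 8253/8254 control word: bits 7..6 select the counter, bits 5..4 the
   access mode (1 = low byte only, 2 = high byte only, 3 = low then high),
   bits 3..1 the operating mode, bit 0 selects BCD counting. */
uint8_t pit_control_word(unsigned counter, unsigned access,
                         unsigned mode, unsigned bcd)
{
    assert(counter <= 2 && access >= 1 && access <= 3);
    assert(mode <= 5 && bcd <= 1);
    return (uint8_t)((counter << 6) | (access << 4) | (mode << 1) | bcd);
}
```
-
- For counter 2, low byte only, mode 3 and binary counting this works out to
- 96h (hexadecimal), which is the value the comment in the listing describes.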
-
-
- Using Interrupts In Linux
- -------------------------
- Linux is a shared-library environment running in protected mode, meaning there
- are no interrupt services. Right?
-
- Wrong. I noticed an INT 80 call on some GAS sample source code with the
- comment "sys_write(ebx, ecx, edx)". This function is part of the Linux syscall
- interface, which means that the interrupt 80 must be a gate into the syscall
- services. Poking around in the Linux source code (and ignoring warnings to
- NEVER use the INT 80 interface as the function numbers may be changed at any
- time), I found the "system call numbers" --that is, what function # to pass on
- to INT 80 for each syscall routine-- in the file UNISTD.H. There are 189 of
- them, so I will not list them here...but if you are going to be doing Linux
- assembly, do yourself a favor and print this file out.
-
- When calling INT 80h, eax must be set to the desired function number. Any
- parameters to the syscall routine must be placed in the following registers in
- order:
-
- ebx, ecx, edx, esi, edi
-
- so that parameter one is placed in ebx, parameter 2 in ecx, etc. Note that
- there is no stack used to pass values to a syscall routine. The result of the
- call will be returned in eax.
-
- Other than that, the INT 80 interface is the same as regular calls (only a bit
- more fun ;). The following program demonstrates a simple INT 80h call in which
- a program checks and displays its own PID. Note the use of the printf() format
- string-- it is best to pseudocode this as a C call first, then make the format
- string a DB and push each variable passed (%s, %d, etc). The C structure
- for this call would be
-
- printf( "%d\n", curr_PID);
-
- Note also that Nasm does not expand C escape sequences ("\n") inside ordinary
- quoted strings; I had to use the hex values (0Ah,0Dh) for the LF and CR.
-
- ;pid.asm====================================================================
- BITS 32
- GLOBAL main
- EXTERN printf
-
- SECTION .data
- szText1 db 'Getting Current Process ID...',0Ah,0Dh,0
- szDone db 'Done!',0Ah,0Dh,0
- szError db 'Error in int 80!',0Ah,0Dh,0
- szOutput db '%d',0Ah,0Dh,0 ;weird formatting is for printf()
-
- SECTION .text
- main:
- push dword szText1 ;opening message
- call printf
- pop ecx
- GetPID:
- mov eax, dword 20 ; getpid() syscall
- int 80h ; syscall INT
- cmp eax, 0 ; there will never be PID 0 ! :)
- jle Error ; signed test: errors come back negative (jb is never taken here)
- push eax ; pass return value to printf
- push dword szOutput ; pass format string to printf
- call printf
- pop ecx ; cleanup stack
- pop ecx
- push dword szDone ; ending message
- call printf
- pop ecx
- jmp Exit
- Error:
- push dword szError
- call printf
- pop ecx
- Exit:
- ret
- ; EOF =====================================================================
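
The remark about escape sequences can be checked from the C side: "\n" is
turned into the single byte 0Ah by the C compiler, not by printf(), so an
assembler that does not process escapes has to be given that byte directly.
A minimal sketch, using snprintf() so the result can be inspected; fmt_bytes
and format_pid are illustrative names, not part of pid.asm.

```c
#include <stdio.h>

/* The bytes szOutput would hold with a plain line feed:
 * '%', 'd', 0Ah, 0 -- spelled out instead of written as "%d\n". */
static const char fmt_bytes[] = { '%', 'd', 0x0A, 0 };

/* Format a PID into buf using the hand-built format string.
 * Returns what snprintf() returns: the number of characters the
 * formatted text needs, not counting the terminating NUL. */
int format_pid(char *buf, size_t size, int pid)
{
    return snprintf(buf, size, fmt_bytes, pid);
}
```

Calling format_pid(buf, sizeof buf, 1234) on a 16-byte buffer leaves the four
digits followed by a single 0Ah byte in buf, exactly what "%d\n" would give.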
-
-
- Final Words
- -----------
- Most of the trouble is going to come from getting used to Nasm itself. While
- nasm does come with a man page, it does not by default install it, so you must
- move it (cp or mv) from
- /usr/local/bin/nasm-0.97/nasm.man
- to
- /usr/local/man/man1/nasm.man
- The formatting is a little messed up, but that is easily fixed using the nroff
- directives. It still does not give you the entire Nasm documentation, however;
- for that, copy nasmdoc.txt from
- /usr/local/bin/nasm-0.97/doc/nasmdoc.txt
- to
- /usr/local/man/man1/nasmdoc.man
- Now you can invoke the nasm man page with 'man nasm' and the nasm documentation
- with 'man nasmdoc'.
-
- For further information, check out the following:
- Linux Assembly Language HOWTO
- Linux I/O Port Programming Mini-HOWTO
- Jan's Linux & Assembler HomePage (bewoner.dma.be/JanW/eng.html)
-
- Also I owe a bit of thanks to Jeff Weeks at code^x software (gameprog.com/codex)
- for forwarding me a couple of GAS hello-worlds in the dark days before I
- found Jan's page.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................ISSUE.CHALLENGE
- 11-byte Program Displays Its Command-Line
- by Xbios2
-
-
- The Challenge
- -------------
- Write an 11-byte program that displays its command line.
-
-
- The Solution
- ------------
- Before saying that these programs won't work, try them. Some of them work only
- after you've run them twice. Anyway, they've been tested both under Windows
- and plain DOS and they work. Believe it or not, these are the first programs
- I've ever written in DOS, so I just tried various ideas until some worked, even
- though I thought they wouldn't... :)
-
- The command line in DOS is found in the PSP (Program Segment Prefix) which in
- .COM files occupies the first 100h bytes in the segment. At offset 80h, a
- <count, char> string (first byte is length of string, and n bytes follow)
- contains everything typed after the filename. The string is followed by a CR
- (carriage return), which is not included in the count.
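
The layout just described can be modelled in C with a 100h-byte buffer
standing in for the PSP. This is only a sketch for experimenting off DOS: the
buffer is simulated, the names PSP_TAIL_OFFSET, PSP_SIZE and read_command_tail
are invented for the example, and it assumes the usual layout in which the CR
sits right after the counted characters.

```c
#include <stddef.h>

#define PSP_TAIL_OFFSET 0x80   /* count byte lives here; text starts at 0x81 */
#define PSP_SIZE        0x100  /* the PSP fills the first 100h bytes */

/* Copy the command tail out of a simulated PSP image into out and
 * NUL-terminate it. Returns the number of characters copied, or -1 if
 * the count byte would run past the end of the PSP or out is too small. */
int read_command_tail(const unsigned char psp[PSP_SIZE],
                      char *out, size_t out_size)
{
    size_t count = psp[PSP_TAIL_OFFSET];
    size_t i;

    if (PSP_TAIL_OFFSET + 1 + count > PSP_SIZE || count + 1 > out_size)
        return -1;
    for (i = 0; i < count; i++)
        out[i] = (char)psp[PSP_TAIL_OFFSET + 1 + i];
    out[count] = '\0';
    return (int)count;
}
```

Filling the buffer with a count of 4 at offset 80h, the characters " abc" from
81h on and a 0Dh after them, then calling read_command_tail(), yields the
four-character string " abc" (the leading space is part of the tail).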
-
- The requested program should be composed of three parts:
-
- 1. set up pointers to data
- 2. display data
- 3. exit
-
- Actually all the following programs DON'T include part 3, but read on. The
- data (command line) can be printed either as a single string, or character by
- character.
-
-
- APPROACH 1: Print single string
- -------------------------------
- For the first approach there are two interrupts:
- 1. INT 21, 9 ; write $ terminated string
- 2. INT 21, 40 ; write to file using handle
-
- For the first case, part 2 would be:
- mov ah, 9
- mov dx, 81h
- int 21h
- that makes 7 bytes, leaving only 4 bytes to replace the last CR with a '$',
- which is too few. (Actually, if the user typed a $ as the last character on
- the command line, this would make the smallest possible program.) The
- shortest program I managed to write is 13 bytes:
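- shr si,1 ; D1 EE ; SI = 80h (relies on SI = 100h at startup)
- lodsb ; AC ; AL = length, SI = 81h (relies on AH = 0)
- push si ; 56 ; save start of string
- add si,ax ; 03 F0 ; SI -> the CR after the string
- mov byte ptr [si],'$' ; C6 04 24 ; overwrite the CR with '$'
- xchg bp,ax ; 95 ; AH = 9 only if BP = 09xxh at startup
- pop dx ; 5A ; DX = 81h
- int 21h ; CD 21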
-
- For the second case, the smallest program would be this:
- ; Solution I
- mov dx, 81h ; BA 81 00
- mov cl, ds:[80h] ; 8A 0E 80 00
- mov ah, 40h ; B4 40
- int 21h ; CD 21
-
- The first two lines are part 1 (set up pointers) and the other two are part 2
- (display string). If you think that something is missing you're right: we don't
- set BX (the handle).
-
-
- APPROACH 2: Print char by char
- ------------------------------
- For the second approach there are two interrupts:
- 1. INT 21, 2 ; write char in dl
- 2. INT 29 ; write char in al
-
- Of course the second interrupt is better, since there is no need to load ah
- with a function value. In addition, INT 29 reads the char from AL, so it can be
- used together with LODSB.
-
- The first way to implement this approach is to minimize part 2 (display loop).
- A program that does this is the following:
- ; Solution II
- mov si, 80h ; BE 80 00
- lodsb ; AC
- mov cl, al ; 8A C8
- more: lodsb ; AC
- int 29h ; CD 29
- loop more ; E2 FB
-
- This program prints CX characters. The second way to print the string is to
- print up to the CR. Here is how:
- ; Solution III
- mov si, 81h ; BE 81 00
- more: lodsb ; AC
- int 29h ; CD 29
- cmp al, 13 ; 3C 0D
- jne more ; 75 F9
- nop ; 90
-
- Yes, the last instruction IS a NOP. So we have an 11-byte program that works,
- and even has a NOP in it. Removing the NOP creates an even crazier program that
- is 10 bytes long, displays its command line AND waits for a key press before
- terminating... Actually solution II, by substituting MOV SI,80h with SHR SI,1,
- does the same thing (10 bytes that display the command line and wait for the
- user to press a key).
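
The byte counts and the two backward branches can be double-checked without a
DOS machine. The sketch below copies the opcode bytes from the three listings
above into C arrays and decodes the short-branch displacements; the array and
function names are invented for the example, and offsets are counted from the
first byte of the program rather than from its load address of 100h.

```c
#include <stddef.h>

/* Machine code exactly as listed for Solutions I, II and III. */
static const unsigned char solution1[] = {
    0xBA, 0x81, 0x00,         /* mov dx, 81h             */
    0x8A, 0x0E, 0x80, 0x00,   /* mov cl, ds:[80h]        */
    0xB4, 0x40,               /* mov ah, 40h             */
    0xCD, 0x21                /* int 21h                 */
};
static const unsigned char solution2[] = {
    0xBE, 0x80, 0x00,         /* mov si, 80h             */
    0xAC,                     /* lodsb                   */
    0x8A, 0xC8,               /* mov cl, al              */
    0xAC,                     /* more: lodsb  (offset 6) */
    0xCD, 0x29,               /* int 29h                 */
    0xE2, 0xFB                /* loop more    (offset 9) */
};
static const unsigned char solution3[] = {
    0xBE, 0x81, 0x00,         /* mov si, 81h             */
    0xAC,                     /* more: lodsb  (offset 3) */
    0xCD, 0x29,               /* int 29h                 */
    0x3C, 0x0D,               /* cmp al, 13              */
    0x75, 0xF9,               /* jne more     (offset 8) */
    0x90                      /* nop                     */
};

/* Target offset of a 2-byte short branch (opcode + signed 8-bit
 * displacement) found at offset `at`. The displacement is relative
 * to the instruction that follows the branch. */
int short_branch_target(const unsigned char *code, size_t at)
{
    int disp = code[at + 1];
    if (disp >= 0x80)
        disp -= 0x100;        /* sign-extend the 8-bit displacement */
    return (int)at + 2 + disp;
}
```

Each array has sizeof equal to 11, and short_branch_target() applied to the
LOOP at offset 9 of solution2 and the JNE at offset 8 of solution3 gives 6 and
3, the offsets of the two "more:" labels.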
-
- BTW: I really don't know why these programs work, though I have one or two
- theories...
-
-
- Next Issue Challenge
- --------------------
- Write the smallest possible PE program (win32) that outputs its command line.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::.......................................................FIN
-
-
-
-